[RFC] commit-reach: skip STALE drain when only one merge-base needed#2109
[RFC] commit-reach: skip STALE drain when only one merge-base needed#2109spkrka wants to merge 1 commit intogitgitgadget:masterfrom
Conversation
Welcome to GitGitGadgetHi @spkrka, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that either:
You can CC potential reviewers by adding a footer to the PR description with the following syntax: NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txtTo iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description): To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
|
There is an issue in commit a43f270:
|
a43f270 to
09c61b0
Compare
|
There is an issue in commit 09c61b0:
|
09c61b0 to
50f4c7d
Compare
derrickstolee
left a comment
There was a problem hiding this comment.
Meta comments:
- You can use your PR description to introduce yourself as a new contributor. It will not impact the commit message that is stored in the final repository, but helps folks get to know you.
- I like your custom tests. I'd be interested to know if any performance tests in t/perf/ would help. You can
cd t/perf/ && ./run HEAD~1 HEAD -- p<num>-<name>.sh. However: I checked the existing tests and I don't think that any will actually show meaningful results in the way your manual test does.
|
/allow |
|
User spkrka is now allowed to use GitGitGadget. WARNING: spkrka has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use. |
50f4c7d to
54cdf9b
Compare
|
/preview |
|
Preview email sent as pull.2109.git.1778252509243.gitgitgadget@gmail.com |
|
/submit |
|
Submitted as pull.2109.git.1778252837132.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via git@0b5178e. |
|
This patch series was integrated into seen via git@b7cd8e6. |
|
This branch is now known as |
|
This patch series was integrated into seen via git@f055420. |
|
Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> diff --git a/commit-reach.c b/commit-reach.c
> index d3a9b3ed6f..c9d2d594de 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -55,14 +55,16 @@ static int paint_down_to_common(struct repository *r,
> struct commit **twos,
> timestamp_t min_generation,
> int ignore_missing_commits,
> + int find_all,
> struct commit_list **result)
> {
> struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> int i;
> + int has_gens = min_generation || corrected_commit_dates_enabled(r);
Giving a name that identifies what the commonly-used logical
expression means is a very good idea, but don't some caller pass
min_generation==infinity (i.e., not zero) when there is no
generation available? E.g., remove_redundant_no_gen() passes
the result of calling commit_graph_generation(), and without commit
graph, we would get infinity here, right?
Given that the second user of this variable is guarded by !find_all,
the variable being true even when we shouldn't be using generation
numbers may not matter for the purpose of the early break, but then
it means that the patch squanders the perfect opportunity to clarify
what the variable means, which is not very lovely.
> timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
> struct commit_list **tail = result;
>
> - if (!min_generation && !corrected_commit_dates_enabled(r))
> + if (!has_gens)
> queue.compare = compare_commits_by_commit_date;
>
> one->object.flags |= PARENT1;
> @@ -97,6 +99,11 @@ static int paint_down_to_common(struct repository *r,
> if (!(commit->object.flags & RESULT)) {
> commit->object.flags |= RESULT;
> tail = commit_list_append(commit, tail);
> + /* Generation-ordered queue: no later
> + * commit can dominate this one. */
> + if (!find_all && has_gens &&
> + generation < GENERATION_NUMBER_INFINITY)
> + break;
Three comments
* See Documentation/CodingGuidelines for our preferred style for
multi-line comments.
* I do not think we often use "dominate" to describe relationship
between commits, and I am not sure what you want the word to mean
in this context. Can you clarify?
* The code is getting way too deeply indented. Perhaps using a
helper function and
if (!(commit->object.flags & RESULT)) {
if (mark_result(r, &tail, commit,
find_all, min_generation))
break;
}
move the logic to mark the newly discovered commit as one of the
possible result, and to tell the loop to terminate, to it, along
with the comment on the meaning of its return value, may make the
code easier to follow?
> +/* To be used only when object flags after this call no longer matter.
> + * When find_all is false and generation numbers are available, returns
> + * after finding the first merge-base, skipping the STALE drain. */
Ditto on the style. |
|
There was a status update in the "New Topics" section about the branch "git merge-base" optimization. Comments? source: <pull.2109.git.1778252837132.gitgitgadget@gmail.com> |
54cdf9b to
43c6bdf
Compare
|
This patch series was integrated into seen via git@41a7672. |
43c6bdf to
f7b5c26
Compare
|
/submit |
|
Submitted as pull.2109.v2.git.1778480348118.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
Patrick Steinhardt wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:19:08AM +0000, Kristofer Karlsson via GitGitGadget wrote:
> diff --git a/commit-reach.c b/commit-reach.c
> index d3a9b3ed6f..b4ca00bb7e 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -165,7 +175,7 @@ static int merge_bases_many(struct repository *r,
> oid_to_hex(&twos[i]->object.oid));
> }
>
> - if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
> + if (paint_down_to_common(r, one, n, twos, 0, 0, find_all, &list)) {
> commit_list_free(list);
> return -1;
> }
Callsites like this are quite hard to read now with these boolean flags.
Would it be preferable to instead use flags?
enum paint_down_to_common_flags {
PAINT_DOWN_TO_COMMON_IGNORE_MISSING_COMMITS = (1 << 0),
PAINT_DOWN_TO_COMMON_FIND_ALL = (1 << 1),
};
It's more verbose of course, but that's kind of the point.
Only weirdness is that we don't only accept these flags in
`paint_down_to_common()`, but also in other functions that pass those
flags down.
Patrick |
|
User |
|
This patch series was integrated into seen via git@dabfe27. |
Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and sort to the top of the priority queue. After those, commits with finite generation numbers are popped in non-increasing order. When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit with a finite generation is therefore a best merge-base: no commit still in the queue can be a descendant of it. Skip the expensive STALE drain in this case. Introduce enum merge_base_flags with MERGE_BASE_FIND_ALL and MERGE_BASE_IGNORE_MISSING_COMMITS, replacing the two boolean parameters in paint_down_to_common(). Thread the flags through merge_bases_many(), get_merge_bases_many_0(), and the public repo_get_merge_bases_many_dirty() API. git merge-base (without --all) passes 0, triggering the early exit. On a 2.2M-commit merge-heavy monorepo with commit-graph: HEAD vs ~500: 5,229ms -> 24ms HEAD vs ~1000: 4,214ms -> 39ms HEAD vs ~5000: 3,799ms -> 46ms HEAD vs ~10000: 3,827ms -> 61ms Signed-off-by: Kristofer Karlsson <krka@spotify.com>
f7b5c26 to
e4dada8
Compare
Context for what this is all about.
I am working with a very large git monorepo and have been investigating performance issues. After some digging I ended up looking more deeply into git merge-base. I saw it had an --all parameter but the default is to only return a single merge-base. Looking through the code and adding debug timing, I realized that although the total time to compute the merge-base was high, a very small amount of time was spent finding the initial merge-base value that was later returned.
The optimization is actually quite dramatic in a large repo - runtime went down from 5000ms to 50ms, so it's roughly a 100x optimization. This comes from an exploding frontier of STALE commits to drain.
Thus, my idea is simply to return early from the function once we know what will be returned. This only works if we find a candidate that we know will not be pruned later - but fortunately if we have a commit graph with generations we will visit commits in order such that it will actually not be pruned.
CC: Derrick Stolee stolee@gmail.com
Changes since v1 (thanks Junio for the review):
Dropped the
has_gensvariable entirely. If a commit has afinite generation then it is in the commit-graph, and so are
all its ancestors — no additional check is needed to know
the queue ordering is sound. Without a commit-graph every
commit gets INFINITY and the guard never fires. This also
avoids the misleading interaction with callers that pass
non-zero min_generation without having generation data.
Simplified the early exit guard from three conditions to two:
!find_all && generation < GENERATION_NUMBER_INFINITY.Fixed multi-line comment style per CodingGuidelines.
Replaced "dominate" with concrete reasoning about queue ordering.
Did not extract a helper function: after the simplifications above
the inner block is four lines and reads naturally inline. The
right boundary for a helper is not obvious (it could absorb just
the result marking, or also the RESULT flag check, or also the
PARENT1|PARENT2 test) and each level requires more local state
passed by pointer. Happy to extract one if preferred.
Changes since v2 (thanks Patrick for the suggestion):
Replaced the boolean
find_allandignore_missing_commitsparameters in paint_down_to_common() with a single
enum merge_base_flags mb_flags, reducing the function from8 to 7 parameters. The enum is defined in commit-reach.h
with MERGE_BASE_FIND_ALL and MERGE_BASE_IGNORE_MISSING_COMMITS.
Named the enum
merge_base_flagsrather thanpaint_down_to_common_flagssince the flags express callerintent and are threaded through multiple layers including the
public repo_get_merge_bases_many_dirty() API.
Used
mb_flagsas the parameter name to avoid shadowing theexisting local
int flags(commit object flags) insidepaint_down_to_common().
cc: Patrick Steinhardt ps@pks.im